BuilDiff: 3D Building Shape Generation using Single-Image Conditional Point Cloud Diffusion Models
3D building generation with low data acquisition costs, such as single
image-to-3D, is becoming increasingly important. However, most existing
single image-to-3D building creation works are restricted to images with
specific viewing angles, so they are difficult to scale to the general-view
images that commonly appear in practical cases. To fill this gap, we propose a
novel 3D building shape generation method that exploits point cloud diffusion
models with image conditioning schemes and is flexible with respect to the
input images. By having two conditional diffusion models cooperate and introducing a
regularization strategy during the denoising process, our method is able to
synthesize building roofs while maintaining the overall structures. We validate
our framework on two newly built datasets, and extensive experiments show that
our method outperforms previous works in terms of building generation quality.
Comment: 10 pages, 6 figures, accepted to ICCVW202
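Point cloud diffusion models of this kind noise a clean point cloud step by step and learn to reverse that process. The sketch below shows the forward noising and a single reverse step, assuming a standard DDPM-style formulation with a linear beta schedule. It is not BuilDiff's two-model cascade or its regularization strategy; `eps_pred` stands in for the output of a hypothetical image-conditioned noise-prediction network.

```python
import numpy as np

# Assumed DDPM-style linear noise schedule (not taken from the BuilDiff abstract).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)


def q_sample(x0: np.ndarray, t: int, rng: np.random.Generator):
    """Forward process: noise a clean (N, 3) point cloud to step t in closed form."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps


def p_sample_step(xt: np.ndarray, t: int, eps_pred: np.ndarray,
                  rng: np.random.Generator) -> np.ndarray:
    """One ancestral reverse step. eps_pred would come from a (hypothetical)
    denoiser conditioned on image features; here it is just an input array."""
    mean = (xt - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_pred) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # no noise is added at the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(xt.shape)
```

If the true noise is passed as `eps_pred` at `t = 0`, the reverse step returns the clean cloud exactly, which is a quick sanity check of the two formulas.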
HRVQA: A Visual Question Answering Benchmark for High-Resolution Aerial Images
Visual question answering (VQA) is an important and challenging multimodal
task in computer vision. Recently, a few efforts have been made to bring the VQA
task to aerial images, owing to its potential real-world applications in disaster
monitoring, urban planning, and digital earth product generation. However, both
the huge variation in the appearance, scale, and orientation of the
concepts in aerial images and the scarcity of well-annotated datasets
restrict the development of VQA in this domain. In this paper, we introduce a
new dataset, HRVQA, which provides 53,512 collected aerial images of 1024×1024
pixels and 1,070,240 semi-automatically generated QA pairs. To benchmark the
understanding capability of VQA models for aerial images, we evaluate the
relevant methods on HRVQA. Moreover, we propose a novel model, GFTransformer,
with gated attention modules and a mutual fusion module. The experiments show
that the proposed dataset is quite challenging, especially for questions
related to specific attributes. Our method achieves superior performance
compared with previous state-of-the-art approaches. The dataset and the
source code will be released at https://hrvqa.nl/
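The dataset figures quoted above imply the scale below: on average exactly 20 QA pairs per image, and roughly 56 gigapixels of imagery. This is only arithmetic on the numbers in the abstract; it assumes every image is a single 1024×1024 tile as stated.

```python
# Figures quoted in the HRVQA abstract.
num_images = 53_512
num_qa_pairs = 1_070_240
side = 1024  # each image is 1024 x 1024 pixels

qa_per_image = num_qa_pairs / num_images
total_pixels = num_images * side * side

print(f"QA pairs per image: {qa_per_image:.1f}")  # QA pairs per image: 20.0
print(f"Total pixels: {total_pixels:,} (~{total_pixels / 1e9:.1f} gigapixels)")
# Total pixels: 56,111,398,912 (~56.1 gigapixels)
```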
Flow-based GAN for 3D Point Cloud Generation from a Single Image
Generating a 3D point cloud from a single 2D image is of great importance for
3D scene understanding applications. To reconstruct the whole 3D shape of the
object shown in the image, existing deep-learning-based approaches use
either explicit or implicit generative modeling of point clouds, which,
however, suffer from limited quality. In this work, we aim to alleviate this
issue by introducing a hybrid explicit-implicit generative modeling scheme,
which inherits flow-based explicit generative models for sampling point
clouds at arbitrary resolutions while improving the detailed 3D structures of
point clouds by leveraging implicit generative adversarial networks (GANs).
We evaluate our method on the large-scale synthetic dataset ShapeNet, and the
experimental results demonstrate the superior performance of the proposed
method. In addition, we demonstrate the generalization ability of our method
on cross-category synthetic images as well as on real images from the
PASCAL3D+ dataset.
Comment: 13 pages, 5 figures, accepted to BMVC202
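The reason a flow-based explicit model can sample point clouds "with arbitrary resolutions" is that each point is drawn independently from a base Gaussian and pushed through an invertible, shape-conditioned map, so any number of points can be drawn. The toy sketch below uses a single elementwise affine transform whose shift and log-scale come from hypothetical linear encoders of an image embedding `c`. Real models use much deeper flows, and the GAN component of the hybrid scheme is not shown.

```python
import numpy as np


class ToyConditionalAffineFlow:
    """Hypothetical toy flow: x = mu(c) + exp(s(c)) * z, with z ~ N(0, I) in R^3."""

    def __init__(self, cond_dim: int, rng: np.random.Generator):
        # Hypothetical linear "encoders" from an image embedding to shift / log-scale.
        self.W_mu = 0.1 * rng.standard_normal((cond_dim, 3))
        self.W_s = 0.1 * rng.standard_normal((cond_dim, 3))

    def _params(self, c: np.ndarray):
        return c @ self.W_mu, c @ self.W_s  # each of shape (3,)

    def sample(self, c: np.ndarray, num_points: int, rng: np.random.Generator) -> np.ndarray:
        """Draw any number of points for the shape encoded by c."""
        mu, s = self._params(c)
        z = rng.standard_normal((num_points, 3))
        return mu + np.exp(s) * z

    def inverse(self, x: np.ndarray, c: np.ndarray) -> np.ndarray:
        mu, s = self._params(c)
        return (x - mu) * np.exp(-s)

    def log_prob(self, x: np.ndarray, c: np.ndarray) -> np.ndarray:
        """Exact per-point log-density via the change-of-variables formula."""
        mu, s = self._params(c)
        z = (x - mu) * np.exp(-s)
        log_base = -0.5 * np.sum(z ** 2, axis=1) - 1.5 * np.log(2 * np.pi)
        return log_base - np.sum(s)  # log|det dz/dx| = -sum(s)
```

Calling `sample(c, 256, rng)` and `sample(c, 4096, rng)` on the same model yields a coarse and a dense cloud of the same shape distribution; the exact `log_prob` is what makes such a model "explicit", in contrast to a GAN.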
Unsupervised Domain Adaptation for Multispectral Pedestrian Detection
Multimodal information (e.g., visible and thermal) can enable robust
pedestrian detection to facilitate around-the-clock computer vision
applications, such as autonomous driving and video surveillance. However,
training a reliable detector that works well across different multispectral
pedestrian datasets without manual annotations remains a crucial challenge. In
this paper, we propose a novel unsupervised domain adaptation framework for
multispectral pedestrian detection that iteratively generates pseudo
annotations and updates the parameters of our designed multispectral
pedestrian detector on the target domain. Pseudo annotations are generated using
the detector trained on the source domain, and then updated by fixing the
parameters of the detector and minimizing the cross-entropy loss without
back-propagation. Training labels are generated from the pseudo annotations by
considering the similarity and complementarity between
well-aligned visible and infrared image pairs. The parameters of the detector are
updated using the generated labels by minimizing our defined multi-detection
loss function with back-propagation. The optimal parameters of the detector can
be obtained after iteratively updating the pseudo annotations and parameters.
Experimental results show that our proposed unsupervised multimodal domain
adaptation method achieves significantly higher detection performance than the
approach without domain adaptation, and is competitive with supervised
multispectral pedestrian detectors.
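The abstract does not spell out how visible and infrared pseudo annotations are combined into training labels. As a purely hypothetical illustration of exploiting similarity (both modalities agree) and complementarity (only one modality sees the pedestrian), the sketch below matches boxes across an aligned pair by IoU, averages agreeing boxes, and keeps unmatched boxes only when their score is high. The functions `iou` and `fuse_pseudo_labels` and the thresholds `match_iou` and `single_thr` are assumptions, not the authors' labeling rule.

```python
from typing import List, Tuple

# (x1, y1, x2, y2) in pixels; a detection is (box, confidence score).
Box = Tuple[float, float, float, float]
Detection = Tuple[Box, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse_pseudo_labels(visible: List[Detection], infrared: List[Detection],
                       match_iou: float = 0.5, single_thr: float = 0.8) -> List[Detection]:
    """Hypothetical cross-modal fusion of pseudo annotations for one aligned image pair."""
    fused: List[Detection] = []
    used_ir = set()
    for v_box, v_score in visible:
        # Greedily pick the best still-unmatched infrared box for this visible box.
        best_j, best_overlap = -1, 0.0
        for j, (i_box, _) in enumerate(infrared):
            if j in used_ir:
                continue
            overlap = iou(v_box, i_box)
            if overlap > best_overlap:
                best_j, best_overlap = j, overlap
        if best_j >= 0 and best_overlap >= match_iou:
            # Similarity: both modalities agree, so keep the averaged box.
            used_ir.add(best_j)
            i_box, i_score = infrared[best_j]
            avg_box = tuple((p + q) / 2 for p, q in zip(v_box, i_box))
            fused.append((avg_box, max(v_score, i_score)))
        elif v_score >= single_thr:
            # Complementarity: a confident visible-only detection is kept.
            fused.append((v_box, v_score))
    for j, (i_box, i_score) in enumerate(infrared):
        # Complementarity: a confident infrared-only detection (e.g. at night) is kept.
        if j not in used_ir and i_score >= single_thr:
            fused.append((i_box, i_score))
    return fused
```

In an iterative self-training loop, a rule like this would run on every target-domain image pair after each round of pseudo-annotation, and the detector would then be retrained on the fused labels.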
A patch-based method for the evaluation of dense image matching quality
Airborne laser scanning and photogrammetry are two main techniques for obtaining 3D data representing the object surface. Due to the high cost of laser scanning, we explore the potential of point clouds derived from dense image matching (DIM) as an effective alternative to laser scanning data. We present a framework to evaluate point clouds from dense image matching and the derived Digital Surface Models (DSM) based on automatically extracted sample patches. Dense matching errors and noise levels are evaluated quantitatively at both the local level and the whole-block level. To demonstrate its usability, the proposed framework has been used in several example studies identifying the impact of various factors on DIM quality. One example study shows that the overall quality on smooth ground areas improves when oblique images are additionally used. The framework is then used to compare the dense matching quality on three different terrain types. In another application of the framework, a bias between the point cloud and the DSM generated from a photogrammetric workflow is identified. The framework is also used to reveal inhomogeneity in the distribution of the dense matching errors caused by overfitting the bundle network to ground control points.
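The abstract does not state which per-patch measures the framework uses. A common choice for smooth ground patches, assumed here for illustration, is to fit a least-squares plane: the RMS of the vertical residuals to the patch's own plane gives a local noise level, and the mean signed height difference to a reference plane (for example one fitted to laser scanning points) gives a bias. The function names below are hypothetical.

```python
import numpy as np


def _design_matrix(points: np.ndarray) -> np.ndarray:
    """Columns [x, y, 1] for fitting z = a*x + b*y + c."""
    return np.column_stack([points[:, 0], points[:, 1], np.ones(len(points))])


def fit_plane(points: np.ndarray) -> np.ndarray:
    """Least-squares plane coefficients (a, b, c) for an (N, 3) patch."""
    coeffs, *_ = np.linalg.lstsq(_design_matrix(points), points[:, 2], rcond=None)
    return coeffs


def patch_noise(points: np.ndarray) -> float:
    """Noise level of a smooth patch: RMS of vertical residuals to its own best-fit plane."""
    residuals = points[:, 2] - _design_matrix(points) @ fit_plane(points)
    return float(np.sqrt(np.mean(residuals ** 2)))


def patch_bias(points: np.ndarray, reference_coeffs) -> float:
    """Mean signed height offset of DIM points from a reference plane (e.g. fitted to LiDAR)."""
    ref_z = _design_matrix(points) @ np.asarray(reference_coeffs, dtype=float)
    return float(np.mean(points[:, 2] - ref_z))
```

Per-patch values could then be aggregated, for example averaged over all patches, to obtain block-level figures; that aggregation step is likewise an assumption.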
- …